Revisiting Sparse Convolutional Model for Visual Recognition
Despite strong empirical performance for image classification, deep neural networks are often regarded as "black boxes" and are difficult to interpret. On the other hand, sparse convolutional models, which assume that a signal can be expressed by a linear combination of a few elements from a convolutional dictionary, are powerful tools for analyzing natural images with good theoretical interpretability and biological plausibility. However, such principled models have not demonstrated competitive performance when compared with empirically designed deep networks. This paper revisits sparse convolutional modeling for image classification and bridges the gap between the good empirical performance of deep learning and the good interpretability of sparse convolutional models. Our method uses differentiable optimization layers, defined from convolutional sparse coding, as drop-in replacements for standard convolutional layers in conventional deep neural networks. We show that such models have equally strong empirical performance on the CIFAR-10, CIFAR-100, and ImageNet datasets when compared to conventional neural networks. By leveraging the stable recovery property of sparse modeling, we further show that such models can be much more robust to input corruptions as well as adversarial perturbations at test time, through a simple and proper trade-off between the sparse regularization and data reconstruction terms.
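The paper does not include an implementation in this excerpt, but the core operation of a CSC layer is solving a lasso-type sparse coding problem in the forward pass. Below is a minimal, illustrative numpy sketch of that underlying idea using ISTA (iterative soft-thresholding) on a plain, non-convolutional dictionary; all names, dimensions, and parameter values here are assumptions for illustration, not the paper's actual architecture.

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of the l1 norm: shrinks entries toward zero."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def ista_sparse_code(D, y, lam=0.05, n_iter=500):
    """Solve min_z 0.5*||y - D z||^2 + lam*||z||_1 via ISTA.

    This is the kind of optimization a CSC layer unrolls; a real CSC layer
    would use convolutional dictionaries and a fixed number of iterations.
    """
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - y)                  # gradient of the data term
        z = soft_threshold(z - grad / L, lam / L)  # proximal gradient step
    return z

# Synthetic demo: recover a 3-sparse code from a random unit-norm dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)          # unit-norm atoms
z_true = np.zeros(128)
z_true[[3, 40, 90]] = [1.5, -2.0, 1.0]  # a few active atoms
y = D @ z_true
z = ista_sparse_code(D, y)
print(np.count_nonzero(z))  # only a few of the 128 atoms are active
```

Increasing `lam` at test time trades reconstruction fidelity for sparsity, which is the knob the abstract refers to when discussing robustness to corruptions.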
Revisiting Sparse Convolutional Model for Visual Recognition - Supplementary Material - Xili Dai
As we explain next, this is made possible by sparse modeling with CSC-layers.

B.1 Method

We apply our visualization method described above to the SDNet-18 trained on ImageNet (see Sec. 4). The results are provided in Figure B.1. It can be observed that the shallow layers (e.g., layers 1-5) capture rich details of the input, while the deeper layers of SDNet-18 progressively remove some of the unrelated details from the network input. In Figure C.1, we provide a visualization of the learned dictionary in the first layer of SDNet-18. The result shows that our CSC-layer feature maps are highly sparse. Table D.1 shows the comparison of SDNet-18/34 and SDNet-18/34-All on CIFAR-10 and CIFAR-100; both models achieve high accuracy, while SDNet-18 is significantly faster.

Figure B.1: Visualization of feature maps for 5 images at selected layers of a SDNet-18 trained on ImageNet.

Figure C.1: Visualization of the learned dictionary of the first layer of SDNet-18-All trained on ImageNet.
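The feature-map visualizations described above rest on the fact that a sparse convolutional code can be decoded back to signal space by convolving each feature map with its dictionary filter and summing. Here is a minimal 1-D numpy sketch of that reconstruction; the filter count, lengths, and code positions are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical sketch: decode a sparse convolutional code back to signal space.
# A CSC layer models the input as a sum of dictionary filters convolved with
# sparse feature maps; visualizing a layer amounts to this reconstruction.
rng = np.random.default_rng(1)
filters = rng.standard_normal((4, 7))    # 4 dictionary filters of length 7
codes = np.zeros((4, 100))               # sparse feature maps, one per filter
codes[0, 10] = 1.0                       # a few active coefficients
codes[2, 50] = -0.5
recon = sum(np.convolve(codes[k], filters[k]) for k in range(4))
print(recon.shape)  # (106,) -- full convolution: 100 + 7 - 1
```

Because the codes are highly sparse, the reconstruction is a sum of a few shifted, scaled filters, which is what makes per-layer visualizations like Figure B.1 interpretable.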